{
  "cells": [
    {
      "cell_type": "raw",
      "metadata": {
        "vscode": {
          "languageId": "raw"
        }
      },
      "source": [
        "---\n",
        "sidebar_label: Unstructured\n",
        "sidebar_class_name: node-only\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# UnstructuredLoader\n",
        "\n",
        "```{=mdx}\n",
        "\n",
        ":::tip Compatibility\n",
        "\n",
        "Only available on Node.js.\n",
        "\n",
        ":::\n",
        "\n",
        "```\n",
        "\n",
        "This notebook provides a quick overview for getting started with `UnstructuredLoader` [document loaders](/docs/concepts/document_loaders). For detailed documentation of all `UnstructuredLoader` features and configurations head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html).\n",
        "\n",
        "## Overview\n",
        "### Integration details\n",
        "\n",
        "| Class | Package | Compatibility | Local | [PY support](https://python.langchain.com/docs/integrations/document_loaders/unstructured_file) | \n",
        "| :--- | :--- | :---: | :---: |  :---: |\n",
        "| [UnstructuredLoader](https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html) | [@langchain/community](https://api.js.langchain.com/modules/langchain_community_document_loaders_fs_unstructured.html) | Node-only | ✅ | ✅ |\n",
        "\n",
        "## Setup\n",
        "\n",
        "To access `UnstructuredLoader` document loader you'll need to install the `@langchain/community` integration package, and create an Unstructured account and get an API key.\n",
        "\n",
        "### Local\n",
        "\n",
        "You can run Unstructured locally in your computer using Docker. To do so, you need to have Docker installed. You can find the instructions to install Docker [here](https://docs.docker.com/get-docker/).\n",
        "\n",
        "```bash\n",
        "docker run -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0\n",
        "```\n",
        "\n",
        "### Credentials\n",
        "\n",
        "Head to [unstructured.io](https://unstructured.io/api-key-hosted) to sign up to Unstructured and generate an API key. Once you've done this set the `UNSTRUCTURED_API_KEY` environment variable:\n",
        "\n",
        "```bash\n",
        "export UNSTRUCTURED_API_KEY=\"your-api-key\"\n",
        "```\n",
        "\n",
        "### Installation\n",
        "\n",
        "The LangChain UnstructuredLoader integration lives in the `@langchain/community` package:\n",
        "\n",
        "```{=mdx}\n",
        "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
        "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
        "\n",
        "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
        "\n",
        "<Npm2Yarn>\n",
        "  @langchain/community @langchain/core\n",
        "</Npm2Yarn>\n",
        "\n",
        "```"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Instantiation\n",
        "\n",
        "Now we can instantiate our model object and load documents:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "import { UnstructuredLoader } from \"@langchain/community/document_loaders/fs/unstructured\"\n",
        "\n",
        "const loader = new UnstructuredLoader(\"../../../../../../examples/src/document_loaders/example_data/notion.md\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Load"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Document {\n",
            "  pageContent: '# Testing the notion markdownloader',\n",
            "  metadata: {\n",
            "    filename: 'notion.md',\n",
            "    languages: [ 'eng' ],\n",
            "    filetype: 'text/plain',\n",
            "    category: 'NarrativeText'\n",
            "  },\n",
            "  id: undefined\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "const docs = await loader.load()\n",
        "docs[0]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\n",
            "  filename: 'notion.md',\n",
            "  languages: [ 'eng' ],\n",
            "  filetype: 'text/plain',\n",
            "  category: 'NarrativeText'\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "console.log(docs[0].metadata)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Directories\n",
        "\n",
        "You can also load all of the files in the directory using [`UnstructuredDirectoryLoader`](https://api.js.langchain.com/classes/langchain.document_loaders_fs_unstructured.UnstructuredDirectoryLoader.html), which inherits from [`DirectoryLoader`](/docs/integrations/document_loaders/file_loaders/directory):\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Unknown file type: Star_Wars_The_Clone_Wars_S06E07_Crisis_at_the_Heart.srt\n",
            "Unknown file type: test.mp3\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "directoryDocs.length:  247\n",
            "Document {\n",
            "  pageContent: 'Bitcoin: A Peer-to-Peer Electronic Cash System',\n",
            "  metadata: {\n",
            "    filetype: 'application/pdf',\n",
            "    languages: [ 'eng' ],\n",
            "    page_number: 1,\n",
            "    filename: 'bitcoin.pdf',\n",
            "    category: 'Title'\n",
            "  },\n",
            "  id: undefined\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "import { UnstructuredDirectoryLoader } from \"@langchain/community/document_loaders/fs/unstructured\";\n",
        "\n",
        "const directoryLoader = new UnstructuredDirectoryLoader(\n",
        "  \"../../../../../../examples/src/document_loaders/example_data/\",\n",
        "  {}\n",
        ");\n",
        "const directoryDocs = await directoryLoader.load();\n",
        "console.log(\"directoryDocs.length: \", directoryDocs.length);\n",
        "console.log(directoryDocs[0])\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## API reference\n",
        "\n",
        "For detailed documentation of all UnstructuredLoader features and configurations head to the API reference: https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "TypeScript",
      "language": "typescript",
      "name": "tslab"
    },
    "language_info": {
      "codemirror_mode": {
        "mode": "typescript",
        "name": "javascript",
        "typescript": true
      },
      "file_extension": ".ts",
      "mimetype": "text/typescript",
      "name": "typescript",
      "version": "3.7.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}